explicit feature map
- Asia > Afghanistan > Parwan Province > Charikar (0.05)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- North America > United States > California > Orange County > Anaheim (0.04)
- (2 more...)
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to a satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated to the combined kernel (spectral error). The result verifies that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features, that can scale poly-logarithmically with data. Our empirical results show that given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.
- North America > United States > Texas > Brazos County > College Station (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- North America > United States > California > Orange County > Anaheim (0.04)
- (3 more...)
Spherical Random Features for Polynomial Kernels
Compact explicit feature maps provide a practical framework to scale kernel methods to large-scale learning, but deriving such maps for many types of kernels remains a challenging open problem. Among the commonly used kernels for nonlinear classification are polynomial kernels, for which low approximation error has thus far necessitated explicit feature maps of large dimensionality, especially for higher-order polynomials. Meanwhile, because polynomial kernels are unbounded, they are frequently applied to data that has been normalized to unit l2 norm. The question we address in this work is: if we know a priori that data is so normalized, can we devise a more compact map? We show that a putative affirmative answer to this question based on Random Fourier Features is impossible in this setting, and introduce a new approximation paradigm, Spherical Random Fourier (SRF) features, which circumvents these issues and delivers a compact approximation to polynomial kernels for data on the unit sphere. Compared to prior work, SRF features are less rank-deficient, more compact, and achieve better kernel approximation, especially for higher-order polynomials. The resulting predictions have lower variance and typically yield better classification accuracy.
Reviews: Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
In particular, [1, Algorithm 3] propose an approach for minimization of the expected loss of a linear predictor that aims at finding a good' sparse solution. The main idea of the algorithm from [1] is to iteratively add features by picking a previously unselected feature that amounts to the largest reduction in the expected risk. Then, a linear model is trained using the extended feature representation and afterwards the whole process is repeated. The authors use pretty much the same idea and take a large dictionary of features to represent the data. Following this, they run Algorithm 3 from [1] to pick informative' features and generate a sparse feature representation.
Local Random Feature Approximations of the Gaussian Kernel
Wacker, Jonas, Filippone, Maurizio
A fundamental drawback of kernel-based statistical models is their limited scalability to large data sets, which requires resorting to approximations. In this work, we focus on the popular Gaussian kernel and on techniques to linearize kernel-based models by means of random feature approximations. In particular, we do so by studying a less explored random feature approximation based on Maclaurin expansions and polynomial sketches. We show that such approaches yield poor results when modelling high-frequency data, and we propose a novel localization scheme that improves kernel approximations and downstream performance significantly in this regime. We demonstrate these gains on a number of experiments involving the application of Gaussian process regression to synthetic and real-world data of different data sizes and dimensions.
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Europe > United Kingdom > Wales (0.04)
- Europe > United Kingdom > England (0.04)
- (2 more...)
Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial Networks
Schreurs, Joachim, De Meulemeester, Hannes, Fanuel, Michaël, De Moor, Bart, Suykens, Johan A. K.
Commonly, machine learning models minimize an empirical expectation. As a result, the trained models typically perform well for the majority of the data but the performance may deteriorate on less dense regions of the dataset. This issue also arises in generative modeling. A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. This problem is known as complete mode coverage. We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods and can easily be combined with any GAN. Ridge Leverage Scores (RLSs) are computed by using an explicit feature map, associated with the next-to-last layer of a GAN discriminator or of a pre-trained network, or by using an implicit feature map corresponding to a Gaussian kernel. Multiple evaluations against recent approaches of complete mode coverage show a clear improvement when using the proposed sampling strategy.
Deep Neural-Kernel Machines
In this chapter we review the main literature related to the recent advancement of deep neural-kernel architecture, an approach that seek the synergy between two powerful class of models, i.e. kernel-based models and artificial neural networks. The introduced deep neural-kernel framework is composed of a hybridization of the neural networks architecture and a kernel machine. More precisely, for the kernel counterpart the model is based on Least Squares Support Vector Machines with explicit feature mapping. Here we discuss the use of one form of an explicit feature map obtained by random Fourier features. Thanks to this explicit feature map, in one hand bridging the two architectures has become more straightforward and on the other hand one can find the solution of the associated optimization problem in the primal, therefore making the model scalable to large scale datasets. We begin by introducing a neural-kernel architecture that serves as the core module for deeper models equipped with different pooling layers. In particular, we review three neural-kernel machines with average, maxout and convolutional pooling layers. In average pooling layer the outputs of the previous representation layers are averaged. The maxout layer triggers competition among different input representations and allows the formation of multiple sub-networks within the same model. The convolutional pooling layer reduces the dimensionality of the multi-scale output representations. Comparison with neural-kernel model, kernel based models and the classical neural networks architecture have been made and the numerical experiments illustrate the effectiveness of the introduced models on several benchmark datasets.
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (6 more...)
Low-dimensional Interpretable Kernels with Conic Discriminant Functions for Classification
Ceylan, Gurhan, Birbil, S. Ilker
Kernels are often developed and used as implicit mapping functions that show impressive predictive power due to their high-dimensional feature space representations. In this study, we gradually construct a series of simple feature maps that lead to a collection of interpretable low-dimensional kernels. At each step, we keep the original features and make sure that the increase in the dimension of input data is extremely low, so that the resulting discriminant functions remain interpretable and amenable to fast training. Despite our persistence on interpretability, we obtain high accuracy results even without in-depth hyperparameter tuning. Comparison of our results against several well-known kernels on benchmark datasets show that the proposed kernels are competitive in terms of prediction accuracy, while the training times are significantly lower than those obtained with state-of-the-art kernel implementations.
- Asia > Middle East > Republic of Türkiye > Eskisehir Province > Eskisehir (0.04)
- North America > United States > Wisconsin (0.04)
- Europe > Netherlands > South Holland > Rotterdam (0.04)